Q-ViT: Accurate and Fully Quantized Low-Bit Vision Transformer


FIGURE 2.6
Attention-distance comparison for the full-precision DeiT-Small, the fully quantized DeiT-Small baseline, and Q-ViT on the same input. Q-ViT behaves similarly to the full-precision model, while the baseline suffers from indistinguishable attention distances caused by information degradation.
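The attention distance in Figure 2.6 follows the common ViT definition: the average spatial distance between each query patch and all key patches, weighted by the attention probabilities. The following is a minimal sketch of this computation; the function name and the assumption that the class token has been stripped from the attention map are ours, not from the original implementation.

```python
# Minimal sketch (not the authors' code): mean attention distance of one
# attention map, using the common ViT definition -- the average spatial
# distance between each query patch and all key patches, weighted by the
# attention probabilities.
import torch

def mean_attention_distance(attn: torch.Tensor, grid_size: int) -> torch.Tensor:
    """attn: (heads, N, N) post-softmax attention over N = grid_size**2
    patches; any class token is assumed to be stripped beforehand."""
    assert attn.shape[-1] == grid_size ** 2
    ys, xs = torch.meshgrid(
        torch.arange(grid_size), torch.arange(grid_size), indexing="ij"
    )
    coords = torch.stack([ys.flatten(), xs.flatten()], dim=-1).float()
    dist = torch.cdist(coords, coords)  # (N, N) patch-to-patch distances
    # Expected distance per query, averaged over queries -> one value per head.
    return (attn * dist).sum(-1).mean(-1)
```

For DeiT-Small at 224×224 resolution with 16×16 patches, grid_size is 14; plotting the per-head values across blocks yields curves of the kind compared in Figure 2.6.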

full-precision counterparts as much as possible; thus, maximizing the mutual information between the quantized and full-precision representations [195]. As shown in [171], for the Gaussian distribution, the quantizers with maximum output entropy (MOE) and minimum average error (MAE) are approximately the same within a multiplicative constant. Therefore, minimizing the error between the full-precision and quantized values is equivalent to maximizing the information entropy of the quantized values. Thus, when a deterministic quantization function is applied to the quantized ViT, this objective is equivalent to maximizing the information entropy H(Q_x) of the quantized representation Q_x [171], formulated in Eq. (2.16):

H(Q_x) = − ∑_{q_x} p(q_x) log p(q_x),    (2.16)

where p(q_x) denotes the probability of the quantized value q_x.
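As a concrete illustration, the sketch below estimates H(Q_x) of Eq. (2.16) empirically for a symmetric uniform 4-bit quantizer applied to Gaussian data, alongside the corresponding quantization error. The quantizer form and scale values are our assumptions for illustration, not the Q-ViT quantizer itself; consistent with the MOE/MAE argument above, the scale that minimizes the error is also the one that maximizes the entropy.

```python
# Illustrative sketch (our assumptions, not the authors' implementation):
# estimate the information entropy H(Q_x) of Eq. (2.16) for a symmetric
# uniform b-bit quantizer on Gaussian input, next to the quantization MSE.
import torch

def uniform_quantize(x: torch.Tensor, scale: float, bits: int) -> torch.Tensor:
    """Symmetric uniform quantizer: round(x / scale) clamped to the b-bit range."""
    qmax = 2 ** (bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax)

def entropy(q: torch.Tensor) -> torch.Tensor:
    """Empirical H(Q_x) = -sum_q p(q) log p(q) over the observed levels."""
    _, counts = torch.unique(q, return_counts=True)
    p = counts.float() / q.numel()
    return -(p * p.log()).sum()

x = torch.randn(100_000)  # Gaussian input, as assumed in the text
for scale in (0.05, 0.2, 1.0):
    q = uniform_quantize(x, scale, bits=4)
    mse = ((q * scale - x) ** 2).mean()
    print(f"scale={scale:.2f}  H(Q_x)={entropy(q):.3f} nats  MSE={mse:.4f}")
```

A too-small scale clips most values onto the extreme levels and a too-large scale leaves most levels unused; both waste entropy and enlarge the error, while the intermediate scale spreads probability mass over the levels and minimizes the MSE.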

[Figure 2.7: histograms for Block.0.query, Block.3.query, and Block.6.query, shown for (a) Full-Precision and (b) Q-ViT.]

FIGURE 2.7
Histograms of the query and key values q, k (shaded) along with the PDF curve of the Gaussian distribution N(μ, σ²) [195], for three selected layers in DeiT-T and 4-bit Q-ViT. μ and σ² are the statistical mean and variance of the values.
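To make the statistics behind Figure 2.7 concrete, the sketch below captures the query values of the first block of DeiT-T via a forward hook and compares their histogram against the Gaussian N(μ, σ²) with matched mean and variance. The use of timm and its fused qkv projection layout is our assumption for illustration; the original analysis may extract the values differently.

```python
# Illustrative sketch (our construction, not the authors' code): fit a
# Gaussian N(mu, sigma^2) to one block's query values in DeiT-T, in the
# spirit of Figure 2.7. The hook target "blocks.0.attn.qkv" follows timm's
# DeiT layout, where q is the first third of the fused qkv projection.
import math
import numpy as np
import torch
import timm

model = timm.create_model("deit_tiny_patch16_224", pretrained=True).eval()
captured = {}
model.blocks[0].attn.qkv.register_forward_hook(
    lambda mod, inp, out: captured.update(q=out[..., : out.shape[-1] // 3])
)
with torch.no_grad():
    model(torch.randn(8, 3, 224, 224))  # stand-in batch; use real images

q = captured["q"].flatten().numpy()
mu, var = q.mean(), q.var()  # statistical mean and variance, as in the caption
counts, edges = np.histogram(q, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2
pdf = np.exp(-((centers - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)
print(f"mu={mu:.4f}  var={var:.4f}  "
      f"max |hist - pdf| = {np.abs(counts - pdf).max():.4f}")
```

Overlaying pdf on the histogram (e.g., with matplotlib) reproduces the kind of comparison shown in the figure; repeating the procedure for blocks 3 and 6 covers the three selected layers.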